<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!--
===================================================================================
 *  Copyright by KNIME AG, Zurich, Switzerland
 *  Website: http://www.knime.com; Email: contact@knime.com
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License, Version 3, as
 *  published by the Free Software Foundation.
 *
 *  This program is distributed in the hope that it will be useful, but
 *  WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, see <http://www.gnu.org/licenses>.
 *
 *  Additional permission under GNU GPL version 3 section 7:
 *
 *  KNIME interoperates with ECLIPSE solely via ECLIPSE's plug-in APIs.
 *  Hence, KNIME and ECLIPSE are both independent programs and are not
 *  derived from each other. Should, however, the interpretation of the
 *  GNU GPL Version 3 ("License") under any applicable laws result in
 *  KNIME and ECLIPSE being a combined program, KNIME AG herewith grants
 *  you the additional permission to use and propagate KNIME together with
 *  ECLIPSE with only the license terms in place for ECLIPSE applying to
 *  ECLIPSE and the GNU GPL Version 3 applying for KNIME, provided the
 *  license terms of ECLIPSE themselves allow for the respective use and
 *  propagation of ECLIPSE together with KNIME.
 *
 *  Additional permission relating to nodes for KNIME that extend the Node
 *  Extension (and in particular that are based on subclasses of NodeModel,
 *  NodeDialog, and NodeView) and that only interoperate with KNIME through
 *  standard APIs ("Nodes"):
 *  Nodes are deemed to be separate and independent programs and to not be
 *  covered works.  Notwithstanding anything to the contrary in the
 *  License, the License does not apply to Nodes, you are not required to
 *  license Nodes under the License, and you are granted a license to
 *  prepare and propagate Nodes, in each case even if such Nodes are
 *  propagated with or for interoperation with KNIME.  The owner of a Node
 *  may freely choose the license terms applicable to such Node, including
 *  when such Node is propagated with or for interoperation with KNIME.
===================================================================================
-->
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Data Representation in KNIME</title>
<link rel=stylesheet " type="text/css"
  href="../../../../../stylesheet_knime.css" />
</head>
<body>
<h1>Data Representation in KNIME</h1>
<h2>The DataTable Concept</h2>
<p>
KNIME uses <a href="../DataCell.html">DataCell</a>s to represent data
(each <a href="../DataCell.html">DataCell</a> holds one single entity, for
instance a floating point value). An array of <a href="../DataCell.html">DataCell</a>s
makes up a <a href="../DataRow.html">DataRow</a>. There are a couple of default
implementations of <a href="../DataCell.html">DataCell</a> that hold specific
types of data such as <a href="../def/StringCell.html">StringCell</a>, <a
  href="../def/IntCell.html">IntCell</a>, and <a href="../def/DoubleCell.html">DoubleCell</a>.
The entire data that is passed along nodes in the workflow is exposed in a <a
  href="../DataTable.html">DataTable</a>, containing a (generally unknown) number
of (equal-length) rows. All elements in a column must be compatible to the type
that is assigned to the column, e.g. they must all be numeric or all must be a
strings. The following figure sketches a <a href="../DataTable.html">DataTable</a>.
</p>
<p>
<img src="datatable.png" name="graphics1" alt="" /> <br />
</p>

<p>The meta information to a <a href="../DataTable.html">DataTable</a>
(number of columns, column specific information) is available through the <a
href="../DataTableSpec.html">DataTableSpec</a> (accessible via <a
href="../DataTable.html#getDataTableSpec()">DataTable.getDataTableSpec</a>)
which consists of a set of <a href="../DataColumnSpec.html">DataColumnSpec</a>s
comprising a specific name (a string), type (a <a href="../DataType.html">DataType</a>,
see below), and some (optional) properties such as domain information (minimum,
maximum, possible values), a color handler, and so on for each of the columns.
For further details on these specific classes see their class description and
the <a href="faq.html">FAQ</a> on how to use them.</p>

<p>
A <a href="../../node/BufferedDataTable.html">BufferedDataTable</a> is a special 
implementation of a <a href="../DataTable.html">DataTable</a>, which is used as
data structure to pass data from one node in the workflow to another. This has some
major advantages over using the generic <a href="../DataTable.html">DataTable</a> interface:
</p>
<ul>
    <li>Tables can be written to disc, reducing KNIME's memory footprint (in comparison to 
    generic <a href="../DataTable.html">DataTables</a> where KNIME has no control
    over memory usage).</li>
	<li>Similar to above: <a href="../../node/BufferedDataTable.html">BufferedDataTables</a>
	can be persistently saved, allowing to store and restore the entire workbench state.</li>
	<li>The implementation is approved, ensuring that certain properties hold, for instance
	there are no duplicate row keys and the length of each row (i.e. number of cells) is correct.</li>
	<li>There are different possibilities to instantiate a <a href="../../node/BufferedDataTable.html">BufferedDataTable</a>,
	some of which allow to simply append a column (or a set of columns) to an existing table (think of a node that
	does some calculation on the input table and appends the result to it). This gives generally
	a performance benefit as the unchanged part of the input table does not need to be written again.
	</li>
	<li>The instantiation of a <a href="../../node/BufferedDataTable.html">BufferedDataTable</a> is supervised (using 
	factory methods on the <a href="../../node/ExecutionContext.html">ExecutionContext</a>, which is passed as one
	of the arguments in the 
	<a href="../../node/NodeModel.html#execute(org.knime.core.node.BufferedDataTable[], org.knime.core.node.ExecutionContext)">
	NodeModel's execute method</a>), allowing KNIME to decide whether or not the table shall be written to disc 
	immediately or kept in memory.
	</li>
</ul>

<h2><a name="newtypes">New Types in KNIME</a></h2>
<p>KNIME allows to define customized types of data, e.g. a <a
  href="../DataCell.html">DataCell</a> that carries a representation of a
molecule. Specialized data types bring along their own renderer (e.g. to display
the molecular structure), a customized icon (that is displayed, for instance in
a column header to recognize the column's type), and a comparator. In order to
implement a new type in KNIME, generally you have to create two different
classes/interfaces:</p>
<p>
<ol>
  <li>An interface derived from <a href="../DataValue.html">DataValue</a>
  defining access method(s) to the generic objects. This will be the base
  interface to the new type and will also expose the meta information (renderer
  and such). Since Java does not allow to specify static members nor methods in
  interfaces (or any other class definition), KNIME will access a member in that
  interface with the following signature using reflection: 
  <pre>
  public static final UtilityFactory UTILITY
  </pre>
  The class <a href="../DataValue.UtilityFactory.html">UtilityFactory</a> has
  methods to retrieve specific information to this type implementation. If
  no such a member is provided, the reflection mechanism will use the
  information from the super interface, i.e. DataValue (though, no customized
  information is available in this case). It is highly recommended to subclass
  <a href="../ExtensibleUtilityFactory.html">ExtensibleUtilityFactory</a> instead of
  implementing the <a href="../DataValue.UtilityFactory.html">UtilityFactory</a> interface.
  Usually the special utility factory is defined as an inner class
  of the interface. The new interface should be similar to: 
  <pre>
  public interface MyDataValue extends DataValue {
    
    /** Derived locally. */
    public static final UtilityFactory UTILITY = new ExtensibleUtilityFactory() {
       ...
    };
    
    /** The interface methods. */
    MyValue getMyValue();
 }
 </pre>
 </li>
  
  <li>A class derived from <a href="../DataCell.html">DataCell</a> and
  implementing the interface defined in 1. and any <a href="../DataValue.html">DataValue</a>
  interface to which your new type should be compatible to (for instance our new
  molecule type should also be able to return a simple string representation of
  the molecule – it needs therefore to implement 
  <a href="../StringValue.html">StringValue</a>). The first time the new 
  <a href="../DataCell.html">DataCell</a> is instantiated, KNIME will parse the
  list of implemented <a href="../DataValue.html">DataValue</a> interfaces and
  make the list of compatible <a href="../DataValue.html">DataValue</a>s
  available through the DataCell's <a href="../DataType.html">DataType</a>.
  
  You associate your DataCell implementation with your newly created DataType by registered the cell implementation
  at the extension point <code>org.knime.core.DataType</code> (see the description of the extension point for details). 
  There are two important issues when defining a new <a href="../DataCell.html">DataCell</a>:

  <ul>
    <li><b>Preferred value class</b>: A <a href="../DataCell.html">DataCell</a>
    may potentially implement more than just one single 
    <a href="../DataValue.html">DataValue</a> interface. Take the new molecule type from before as an example, which
    is compatible to the new <tt>MyDataValue</tt>tt> interface but also to 
    <a href="../StringValue.html">StringValue</a>. Both values support a
    customized renderer: Which renderer should KNIME use by default? To
    overcome this situation, KNIME will use the <em>first</em> implemented DataValue interface as “preferred” value
    class. 
    
    <li><b>DataCellSerializer&lt;MyCell&gt;</b>: KNIME frequently buffers
    data to hard disc since <a href="../DataTable.html">DataTable</a>s can get
    arbitrarily big. By default it uses Java-serialization (the base <a
      href="../DataCell.html">DataCell</a> implements <code>java.io.Serializable</code>.
    Since standard serialization is slow, we support reading and writing
    through an appropriate factory, called 
    <a href="../DataCellSerializer.html">DataCellSerializer</a>. In case you implement your custom serializer (which
    you reall should!), it must be registered as "serializer" for the specific cell implementation at the extension
    point <code>org.knime.core.DataType</code> (see the description of the extension point for details).
     
    Note that your serializer must have the correct return type, i.e. the
    generic parameter of the <a href="../DataCellSerializer.html">DataCellSerializer</a>
    must be your <a href="../DataCell.html">DataCell</a> class.</li>
  </ul>
  </li>
</ol>


<h2><a name="typecollection">Collection of DataTypes</a></h2>
<p>Documentation is pending... For a quick overview refer to the  
implementations of <a href="../collection/CollectionDataValue.html">
CollectionDataValue</a> and <a href="../collection/ListCell.html">
ListCell</a>.
</p>
</body>
</html>
